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Abstract 

In the paper we address the problem of finding the most probable state of discrete Markov random 
field (MRF) with associative pairwise terms. Although of practical importance, this problem is known 
to be NP-hard in general. We propose a new type of MRF decomposition, submodular decomposition 
I (SMD). Unlike existing decomposition approaches SMD decomposes the initial problem into subprob- 

■ lems corresponding to a specific class label while preserving the graph structure of each subproblem. 

^ I Such decomposition enables us to take into account several types of global constraints in an efficient 

Q ■ manner. We study theoretical properties of the proposed approach and demonstrate its applicability on 

a number of problems. 
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1 Introduction 



The problem of effective Bayesian inference arises in many applied domains, e.g. in machine learning, 
computer vision, decision-making, etc. One of the most intriguing problems is the development of approx- 
imate inference algorithms for the problems that are NP-hard in general. An important particular case is 
the inference problem in cychc discrete Markov random fields (MRF) with energies that can be represented 
via sum of unary and pairwise terms. 

Let G = (V, £") be an undirected graph with V and £ being the sets of nodes and edges respectively. 
^ ■ With each node we associate a class label tj €{!,..., P}. The inference problem can be formulated as an 

■ energy minimization problem 

O-eV) i^,j)ee ti,...,<|v|e{i,...,P} 

where 9j {tj ) and 9ij {ti , tj ) are some known functions of discrete argument. 

Although NP-hard in general there are several special cases when the inference problems can be solved 
exactly in polynomial time. One example is dynamic programming approach [2 1 ] for inference in tree- 
structured graphs. Another example are MRFs with outer-planar graphs [23], or more generally MRFs with 
low treewidth [ ]. Min-cut/max-flow algoiithms can efficiently solve the inference problem on arbitrary 
graphs when all the variables are binary and pairwise potentials meet submodularity constraints [8, 15]. 

0y(O,O) +%(1,1) < ^,,(0,1) + ^,,(L0). (2) 

The multi-class problems on cyclic graphs do not have efficient exact algorithms except few very special 
cases with specific ordeiings in the space of classes ['\ ^^]. 

Recently advanced approximate methods based on MRF decomposition have appeared [24]. The most 
popular method tree-reweighted message passing (TRW) [29, 13, 17] splits the MRF with cycles into a 



*The authors assert equal contribution and thus joint first authorship. 



number of acyclic subgraphs (trees), for each of them inference is made independently with the subsequent 
harmonization of optimal solutions. The other decomposition methods (e.g. [30, 25, 16, 1]) exploit similar 
ideas. Unlike more efficient approximate energy minimization algorithms, e.g. [5, 18], decomposition 
framework makes it possible to take into account some global properties of the solution, e.g. to establish 
constraints on the areas of classes [31, 32, 20]. 

In this paper we develop an alternative framework for approximate inference in MRF with associative 
pairwise terms. We present submodular decomposition (SMD) technique. Instead of subgraph-based de- 
composition we split the problem into a number of submodular problems keeping the structure of graph Q. 
To harmonize optimal solutions of different subproblems we use the dual decomposition approach [ :]. We 
show that SMD provides energy lower bounds which are equivalent to the ones by TRW. However the new 
type of decomposition allows us to take into account the preferences on any type of global linear statistics 
of the class indicator variables in a straightforward manner as well as to establish some shape constraints. 

The rest of paper is organized as follows. In the next section we present SMD framework. We prove the 
equivalence of SMD and TRW lower bounds in section 3 and examine the properties of SMD convergent 
point in section 4. Section 5 presents a possible way for optimizing the lower bound which is based on the 
averaging of min-marginals. The different types of global constraints which can be implemented efficiently 
into SMD framework are discussed in section 6. We present the experimental evaluation of our method in 
section 7 and give some conclusions in section 8. 

2 Submodular Decomposition 



Ujp 



Consider the indicator parametrization of (1) obtained by establishing auxiliary binary variables Y = 

{%p}e{0,l}l^|x^: 

^1, tj=p, 

0, otherwise. 

Denote Oj{p) = 9jp and 9ij{p, q) = Oij^pg = Oji^qp. Then the problem (1) takes the form 

p p 
E{Y) = ^Y^ OjpVjp +^ Yl ^ij,P1y^pyjQ yfcng ' 

jeVp=l (ij)g£-p,q=l 

Here we denoted the set of local constraints by £ = |y | yjp E {0, 1}} and the set of global constraints 
by ^ = {y I J2p=i Vjp = 1) W S V}. The first group of constraints requires yjp to be indicator variables 
while the second group provides consistency of graph labeling, i.e. each node belongs to one and only one 
class. 

In what follows we assume the associativeness [26] of pairwise terms, i.e. 

^ijtPQ — ^ij^P^PQi 

where 5pq = 1 iff p = g and Cij^p are non-negative constants! Then we may rewrite the energy (3) as 
follows 



min EiY) 



p 

min y ( y 

p 



min S2^p{Yp)> mm J]EpiYp)=y] min EpiYp). (5) 



Note that by setting all Cij,p — dj we get an important case of generalized Potts potential. 
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Here we denoted the p^^ column of y by 1^ and (i, j) is an undirected edge of the graph. Note that each 
EpiYp) is a submodular function so that we can find miny ^||vi EpiYp) efficiently by min-cut/max-flow 
algorithm (e.g. [ ]). To reconcile the minimizers of Ep consider a Lagrangian 

L{Y,K) = Y,Ep{Yp) + Y,^i \ Y.y^v-^ ■ 

p=i jev \p=i / 

The following lower bound is valid for the initial problem (3): 

mill E{Y) = min LiY.K) > maxminLfy, A) = 

ye£ng Y&cng A Yec 



max 

A 



^^Fp ™o5}ivil ^^^^^^ +T.^jy^p] -E^^- 



jev / jev 



(V) 



Note that the minimization of 

^piYp,A) = Ep{Yp) + ^X 

jVjp 

w.r.t. Yp can still be done by min-cut algorithms since the functions remain submodular. The maximization 
problem (7) is concave w.r.t. A and hence the solution can be found via subgradient ascent. Thereby we 
replace NP-hard optimization problem (3) with a series of simpler problems that can be solved efficiently. 
Hereinafter we refer to both decomposition and algorithm as submodular decomposition (SMD). 

Note, that miuY^cng E{Y) = minyg£ maxA L(Y, A) and energy lower bound (6) equals the maximin 
value of Lagrangian maxA minyg£ L(y, A). Now consider the continuous relaxation Q = {Y\yjp G 
[0, 1]} of local constraints. It can be shown [^^, Theorem 3.1] that uiinYeCng E{Y) = minyggng E{Y) 
for arbitrary number of classes, unary and pairwise potentials, hence 



min ^JYr,,A)= min ^JY„,A). 



This implies 



L* = min E(Y) = minmaxLfy, A) > maxminLfy, A) = maxminLfy, A) = L*. (8) 

Yecng YeQ a ' ' A vec A YeQ 

In the space (Qna)xIRl^ltheLa grangian (6) is smooth and differentiable function, hence we can find the 

global minimum of energy (3) by performing SMD if and only if L{Y, A) has a saddle-point in (Q n ^) x 

toIVI 



3 Equivalence of Lower Bounds 

In this section we show that the optimal value of Lagrangian dual (7), which is solved by SMD, is equivalent 
Kleinberg-Tardos (KT) relaxation [10] of uniform metric labeling problem. Furthermore we prove that 
KT-relaxation is equivalent to Schlesinger LP-relaxation which is reviewed in [ ] and is solved e.g. by 
TRW [29]. 

Theorem 1 The maximin value the Lagrangian (6) equals the minimum value of KT-relaxed problem [10]: 

P ^ P 

= rnin ^ ^ (^jp^jp + - ^ ^ Cij^pXij^p, (9) 

P 

S. t. ^ ^ Xip — 1 , 

p=l 

^ '^ip •^3pi '^'ijip — ^3P •^tpi 
'^ij,p — '^ip — 0" 
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See the proof in appendix A. 

Theorem 2 If energy (3) satisfies the condition (4) then the KT-relaxation [10] is equivalent to the 
Shlesinger LP -relaxation: 

p p 
L^, = mill ^ ^ + ^ dij^pgZij^pg, (10) 





{i,j)e£ p,'i=^ 


• ^ ^ Zip ~ 1 ) 




p=l 




P 


P 






p=i 




Zij^pq ^ 0, Zjp 


>o, 


Otherwise. 





See the proof in appendix B. 

4 Properties of the Maximin Point 

Consider the properties of L{Y,A) when Y £ C, i.e. all yjp S {0, 1}. Then the maximin problem (7) 
can be viewed as maximization of a piecewise linear function minyg£ L{Y, A) in the space of A's. Each 
Y £ C defines a hypeiplane in this space and Q, = {{I, A) | / < minyg£ L(Y, A)} is a concave polytope. 

Lemma 1 Let (y'^, A'^) be a maximin point of the Lagrangian, i.e. L(Y^,A^) = maxA miny££ L{Y, A). 
The global minimum of energy E{Y) equals L{Y^,h9) if and only if there is a horizontal hyperplane 
defined by Y^ such that L{Y\A'^) = L{Y°, A°). 

If it is known additionally that subgradient o/ minyg£ L(y, A) at the point A^ consists of only one 
vector then Y^ is the optimal solution of (3). 

See the proof in appendix C. 

Since minyg£ L{Y, A) is a piecewise linear function three cases are possible at its maximum (see fig. 1). 
In the first case there is a plateau on the top of the polytope. In the second case there is no plateau but there 
exists a horizontal hyperplane defined by Y^ that touches polytope Q,. The third case corresponds to the 
non-zero duality gap. Note that in the first case we get the optimal value of Y^ = arg miuYecng EiY) 
whereas in the second case in general we may not get the optimal value of £ Q in an explicit manner. 
Several cases when it becomes possible are considered further. 

Now we define and investigate strong agreement (SA) and weak agreement (WA) conditions that are 
analogous to strong tree agreement (STA) and weak tree agreement (WTA)? 

Denote the possible optimal labelings of the node j in subproblem p given A as 

Z.piA) = {z\3Yp* = (yip, . . . : y*p = z, %{Y;,A) = m\n%{Yp,A)}. 

Definition 1 Tlie variables A satisfy strong agreement condition if 

Vj 3!p : Z,p{A) = {!}, / p : Z,,(A) = {0}. 

From the last definition it follows that A" satisfies strong agreement if and only if 

argminyg£L(y, A°) G Q. 

Theorem 3 If there exists Js9 that satisfies strong agreement condition and Y^ = argminyg/; L(Y., A9), 
then E{Y^) = minyg£n£; E{Y). 



^WTA and STA were defined in [ 14] for TRW framework. In this paper we explore similar conditions for SMD. 
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mmi-g£L(y,A) minygc A) mmYecL{Y,A) 




Figure 1: Three possible cases in SMD dual problem. The optimal energy value is shown by dotted line. 
The left plot corresponds to the case when we are able to find both optimal energy and optimal configuration 
of labels. The middle plot shows the case when there is zero duality gap but we are not guai^anteed to find 
the optimal configuration of labels. The horizontal face corresponds to (see lemma 1). The right plot 
illustrates the case when there's no horizontal face coming through the maximin point. Then we observe a 
non-zero duaUty gap. 



Definition 2 The variables A satisfy weak agreement condition if for all j both statements hold true: 
2. If3p : Zjp{A) = {!}, then / p ^ e Zjq{A); 

It is easy to see that strong agreement condition implies weak agreement condition. 

Theorem 4 If the point {A^, Y^) is a maximin point of L{Y, A) then variables A^ satisfy weak agreement 
condition. 

Proof. Assume the contrary. Let {A^,Y^) be a maximin point. There are two possible cases: there exist j, 
p, and q such that Zjp{A^) = Zjq{A^) = {1}; there exists j, such that for all p it holds Zjp(A°) = {0}. In 
the first case we may always take A^ such that for all i ^ j holds Xj = A? and = + where e > 0. 
If e is small enough {1} still belongs to both Zjp{A^) and Zjq{A^). Hence Y^ e Arg min L(y, A^) but 
L{Y^,A^) > L{Y^, A°) that contradicts the condition that (A'^, Y^) is maximin point of the Lagrangian. 
The second case is considered in a similar manner. ■ 
Let Free{p, A) be a set of nodes such that Zjp{A) = {0, 1}, and let Free^{p, A) be its k*^ connected 
component. 

Theorem 5 Let Y* = arg minyp ^p{Yp, A). Consider Yp^'^ such that 



(i)_Jl, i€Free(p,A), 
'ip' 



^"'^ 1 otherwise 



and YjP^ such that 



Then the following is true 



(i)^|0, j€Free{p,A), 
"'^ I yL, otherwise. 



%iYi'\A) = %(Yi'\A) = %(Yp*,A). 



The proof follows from submodularity inequality for the function Ep{Yp). 

Note that the last theorem can be trivially generalized to separate connected components. In each 
Free'^{p, A) we may replace y*^ with either all ones or all zeros regardless of other Free\p, A) and still 
keep ^p{Yp, A) at its minimum. This is an important property since it helps to hamionize the solutions of 
subproblems. 
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Theorem 6 Assume that satisfies weak agreement condition. Then for any e > there always exists 
such that: 

• ||A° - A^ll < e; 

• A^ also satisfies weak agreement condition; 

• minyg£L(y,AO) = minyg£L(y,Ai); 

• If for any j there is a subproblem p such that j G Free{p, A^) then 

— there exists at least yet another subproblem q such that j € Free{q, A^); 

- for all problems r such that j Free{r, A^) it follows Z-,>(A^) = {0}. 

Proof. We provide here a sketch of the proof. 

At the point A*^ the Lagrangian has either a plateau or a peak w.r.t. each variable Xj. In case of plateau 
if Xj is on the border then take Xj it e in order to be inside the plateau. At each interior point of the plateau 
Zjp{A^) consists of a single element for each p. 

If there is no plateau then has a jump of at least 2 at the point A'^. This means that there are at 



least two subproblems p and q such that Zjp{A^) = Zjq{A^) = {0, 1}. 

If for A'^ there exists r such that Zjr = {1} then set A j = + e. Zjr{A^) still contains {1} while for 
all other subproblems Zjq{h}) consists of only zeros. ■ 

Without loss of generality we may consider only those A^ that satisfy the last theorem. Then the 
following theorems hold. 

Theorem 7 Assume that A" satisfies weak agreement condition and = argminyg£ L{Y, IsP). If there 
exists such subproblem p that for all q it follows that Free{q, A^) C Free{p, A^) then setting 

1, j G Free{r,A^), r = p; 
0, j S Free{r, A^), r ^ p; 
^yj^, otherwise, 

we get an optimal configuration Y* = arg minyg^ng E(Y). 

This theorem is a corollary of theorems 5 and 6. Note that it can be straightforwardly extended to the 
case of separate connected components. 

Theorem 8 Assume that A^ satisfies weak agreement condition and Y^ = argminyg£ L{Y, A^). If there 
exists a partition Free^^ (pi, A*^), . . . , Free''"' {pm,A^) of the set I = {j\3p : Zjp = {0, 1}} such that 
Free^'(pi, A°) n Free'^^ {pj , A'^) = and I = |Jj Free'^'(pj, A°) then an optimal configuration Y* = 
arg minyg£ne -E'(^) can be constructed as follows 

y'ir, j ^ I; 



1, r = Pi, j G Free'^'(pj, A°), Vi = 1, 
0, otherwise. 



Proof. For each i we apply theorem 7 to Free''' {pi. A"). ■ 
The last theorem allows us to get an optimal solution of initial problem in the some cases when duality 
gap is zero but there is no plateau in the top of a polytope defined by minyg£ L{Y, A) (case 2 in lemma 1). 
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5 Min-Marginals Averaging 



In this section we derive an alternative approach to the optimization of SMD lower bound based on the 
min-marginals averaging. First we prove an important lemma. 

Suppose that for each of our subproblems we have estimated unary min-marginals 6jp{k,A) = 
m.m^Yp\yjp=k} ^pO^p: This can be done effectively e.g. using dynamic graphcut framework [1 1]. Now 
consider the following re-estimation equations for Lagrange multipliers 

X]^"" = Xf + AXj, (11) 



where AXj 



i,p(0,A°''^)-%p(l,A°'^) . 

Theorem 9 The iterative process defined by {W) has the following properties: 

• The lower bound of the Lagrangian does not decrease at each step; 

• The fixed point of the process satisfies weak agreement condition. 
The proof is in appendix D. 

Theorem 10 If = 0/or some point hP then hP is a maximum point ofLiY, A) w.r.t. Xj given all other 
X's are fixed. 

Proof. The condition AXj = implies maxp(^jp(0) — 9jp{l)) = 0. If we increase Xj all yjp are assigned 
the value of and hence the Lagrangian decreases. Similarly the decrease of Xj implies the decrease of the 
Lagrangian since there is at least one subproblem p for which t/jp = 1. ■ 
From the last theorem it follows that unless A" satisfies weak agreement condition, among the nodes 
with AXj 7^ there always exists at least one node i such that changing Aj by AAj will definitely increase 
the value of the Lagrangian. The question of how to find such i is easy to answer. Any AXj < leads to the 
increase of L{Y, A). If all AXj > then we should select i for which there exist at least two subproblems 
p and q such that 

\eip{0,A)>9ip{l,A) 

The update of such increases the value of the Lagrangian^^ However we found the process of min- 
marginals averaging to be too slow in practice even with the use of dynamic graphcuts [ . ]. So in our 
implementation we mostly used subgradient ascend switching to min-marginals averaging for one iteration 
when the lower bound started oscillating. 

The possibility of min-marginals averaging and weak agreement condition shows that there are lots of 
similarities between various types of decomposition as well as optimization on trees and optimization of 
submodular functions on cycled graphs have much in common. 



6 The Inclusion of Global Terms 

The submodular decomposition allows us to make use of its form by including several types of the global 
constraints related to classes. In particular we consider all linear constraints of indicator variables of the 
following form 

p 

T.T.<vyw = c'"^ m = l,...,M (12) 

jevp=i 
p 

EE^jp^iP^^''' k = l,...,K. (13) 

jGVp=l 

^If it is not possible to make an update according to ttiese rules than we obtain a strong agreement situation. 
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Then p^^ subpioblem takes the form 

M K 

<^p{Yp, A, M, K) = Ep{Yp) + ^ \jyjp Wj^Vjp + X] X] ^ 

jgV m=l jeV A:=l jgV ^ 

The subproblem is still submodular. The maximization of 

P M K 

mill L(y, A, M,K) = Y^ mill $p(yp, A, M, K) - ^Xj fi^mc"" - K^d^ 

p=l ^ jeV m=l k=l 

w.r.t. Lagrange multipliers A, M, and > can be done e.g. via subgradient ascent since the function 
is piecewise linear and concave. Note that each global constraint adds just one additional variable in the 
Lagrangian. 

Here are several examples of global linear constraints: 

Ujp = c, Strict class size constraints (14) 

iev 

Ujp G [ci, C2], Soft class size constraints (15) 

^ yjp-^j = ^ Flux equalities (16) 

iev jev 

f EjGV Vip^i = M Eigv Vip^ Color mean/ 

1 Ejev yjp(^i - (-^i - flV = Ejev ^^p^' variance. 



(17) 



Here Ij is an observable scalar or vector value associated with node, e.g. it can be the color of an image 
or the intensity of a grayscale image. 

Such global linear statistics cannot be used in the state-of-the-art a-expansion [5] method since it does 

not provide a lower bound for energy and their inclusion in TRW-like algorithms results in the following 

problem , 

LTRw{y, 0) - >^ /^mc™ - >^ Kkd^ max (18) 

s.t. ^/i?^ = t? 

T 

^jp = ^jp + Y t^-r^^Tp + X] '^^'^ip 

m k 
'^ij,pq ^ij,pqj ^k — 0' 

Here T indexes the set of subgraphs that are used in TRW decomposition, L{Y, ©) is sum of optimal en- 
ergies computed on each of the subgraphs, and = defines reparameterization relation. The straightforward 
way to solve the problem leads to nested iterative process. On the inner loop we maximize Ltrw given '& 
and on the outer loop we update M and K by making subgradient ascent. We will refer to this algorithm 
as GTRW. One could use a linear programming relaxation of (18) but then it's not clear how to solve it 
efficiently in the case of general linear constraints such as (14)-(17) (invoking a general -purpose LP solver 
may be too slow in practice). 

Another promising option of SMD is its ability to take into account some shape constraints. In particular 
we may establish all shape constraints that can be achieved by binary graphcut algorithm, e.g. star-shape 
prior [27] and "boundary position" prior [ /] for any number of classes. 
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SMD 

GTRW 

MPF 



0246 0246 

(a) (b) 

Figure 2: Results on a synthetic test of SMD, GTRW, and MPF. (a): solid lines correspond to lower bounds 
of different methods, dashed lines show energy of primal solution (that violates global hard constraints) at 
each iteration; (b): the total constraint violation of primal solution (measured in percentage of pixels that 
has to be recolored to make constraints consistent). On both plots horizontal axes show time measured in 
seconds. The red circles on both plots show the result of TRW-S algorithm as a "starting point" of GTRW 
and MPF. 



7 Experiments 

Synthetic problems. We compare the convergence rates of different methods on a number of artificially 
generated problems with linear constraints. Similarly to [13] as a synthetic setting we take a 10-label prob- 
lem on a 50 X 50 grid graph with 4-neighborhood system. Unary potentials are generated as independent 
gaussians: 6ip ~ M{0, 1). Parameters Cjj p = Cij of pairwise potentials are generated as an absolute value 
of M{0, 0.5). As global constraints we've chosen strict class size constraints (14). Class sizes are deliber- 
ately set to be significantly different from the global minimum of unconstrained energy (1). Particularly, 
for class p we set its size to be \V\p/ J2q=i 1- 

Besides SMD we evaluate the performance of GTRW with TRW-S [1 3] as an inner iterative process. 
The third method is based on MPF framework proposed in [ ]. Let be the lower bound of en- 

ergy (1) with unary potentials ^jp = 6jp — -Kjp and pairwise terms i?ij,pg = O^ pg-, <I>G(n) = miny (IT, Y) 
s-t- J2j Vjp = Cp > 0, Cp = Y e Qr\Q. Then we can maximize ^>L(n) + $G(n) w.r.t. 11 via 
subgradient ascent. We computed <&L(n) using TRW-S algorithm and ^g{^) using simplex method for 
transportation problem. 

Figure 2 shows the convergence of SMD, GTRW, and MPF averaged over 50 randomly generated 
problems. For MPF we do not consider the time required to solve the transportation problem $G(n) since 
it is highly dependent on the effectiveness of the implementation. Plot (a) shows lower bounds and energies 
of primal solutions^ of all three methods, (b) shows the reduction of the total constraint violation of the 
cun^ent solution. Note, that energy of primal solution can be lower than the lower bound since it, generally 
speaking, does not satisfy constraints. Moreover, in GTRW and MPF the energy of the current solution 
increases since both methods start from energy minimum found by TRW-S and try to make the solution 
consistent with constraints. Based on figure 2 we conclude that all three methods converge to the same 
point but in SMD lower bound converges faster and hence primal solution with both low energy and small 
constraint violation can be found. 

Magnetogram segmentation. We demonstrate that SMD can be used for implying flux constraints. 
Figure 3a, b show photos of the Sun in ultraviolet and magnetogram specters. In the regions of increased 
sun activity (Figure 3c) the amplitude of magnetic field is significantly larger (large positive magnetic fields 

■* The choice of primal solution is always an heuristics in dual decomposition methods. In SMD all pixels with conflicting labels 
in the same connected component were assigned to the same randomly selected label from the set of conflicting labels. In GTRW 
primal solution was chosen as TRW-S primal solution. In MPF primal solution was chosen as a solution of local subproblem 
$L(n) solved by TRW-S (solution of global subproblem $G(n) always satisfies global constraints but typically has much higher 
energy). 
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Figure 3: Results on a magnetogram test, (a) - the picture of the Sun at 195 angstrom wavelength in 
the extreme ultraviolet; (b) - corresponding magnetogram: white regions correspond to the areas with 
large positive magnetic field, black regions - to large negative magnetic field, gray regions - to relatively 
inactive areas; (c) - (b) - the increased region of high activity; (d) - the result of a-expansion without 
global constraints: positive flux = 83794, negative flux = -71021, diff = 12773; (e) - the result of SMD with 
flux equality constraint: positive flux = 70972, negative flux = -67341, diff = 3630. 

are shown in white while large negative fields are shown in black). The greater is the deviation of pixel 
intensity from the gray level, the larger is the amplitude of magtetic field. The segmentation of sun's 
surface into quiet regions and the regions with large positive and negative magnetic fields and the analysis 
of their mutual allocation is important for further forecast of sun flares. It is known that for stable sunspots 
the total positive flux approximately equals the total negative flux in some vicinity of a sunspot. This 
constraint is a particular case of (16) and hence can be taken into account by SMD. Figures 3d-e shows 
how the segmentation changed after we have added flux constraints to our energy. Note that the major 
changes happened in the regions where the amplitude of magnetic field was relatively low and therefore 
those regions could be treated as quiet ones. 

Image segmentation. SMD framework gives an opportunity to take into account any constraints that 
can be worked out with a single binary graphcut. As an example we present our results of including star- 
shape prior [ ] into SMD framework for image segmentation. We work in a framework that can be viewed 
as a multilabel version of [3], i.e. user defines a small seed region for each class which we use to collect 
color statistics and to establish the center for star-shape prior. 

As unary potentials we use standard — logP(/j | Xj) pixel. Color statistics is collected from a small 
seed region that is provided by user for each class. As pairwise potentials we use generalized Potts Model 

Cij = ai + 02 exp (^— ^^^J'P ^ ■ We use ai = 2, 02 = 20; for each edge we set a to be the absolute average 
intensity difference in the box corresponding pixels. The box size is set to be 20 by 20. 

A star-shape prior is a compact shape priors that given a center c makes an object to be star-shaped 
around the center, i.e. on any ray starting at c the "object" label 1 cannot appear after the "background" 
label 0. It can be written down via pairwise potentials in the following way: 

[0, if/. = /,-, 

Sij = < 00, if /i = 1 and fj = 0, 
(/3, if/, = Oand/j = l, 

where j is between c and i. Parameter /3 corresponds to the "ballooning force" and is set to in our 
experiments. For more details on star-shape piiors see [27]. 

Figure 4 presents our results. Note that TRW-S and a-expansion converge to similar almost optimal 



10 




(a) (b) (c) (d) 



Figure 4: Results for the image segmentation problem, (a) - TRW-S and a-expansion results; (b) - the 
results of independent usage of binaiy graphcut with star-shape prior for each object; (c) - the result of 
SMD with star-shape prior; (d) - the result of SMD with both star-shape prior and class size equality 
constraints (all classes except background should have equal sizes). 



solution but it has poor quality unless we take into account global constraints. This effect is a result of 
mixed color statistics in unary terms. 

8 Discussion and Future Work 

In the paper we presented a novel approach to approximate Bayesian inference in associative MRFs with 
cycles based on the decomposition of initial NP-hard problem into a number of submodular problems. The 
advantage of such approach is the fact that each subproblem is related to a specific class. Hence any global 
statistics of classes can be added directly to the subproblems rather than by organizing iterative process 
which requires full MAP-inference on each iteration. Some questions still remain open: 

• Is it possible to derive an efficient algorithm for min-marginal averaging? 

• How the value of the gap between minimax and maximin depends on the number of global con- 
straints? 

• Are there any partial label optimality guarantee for partially coincident subproblems solutions? 

Two directions of further research work on SMD should be mentioned. We plan to adapt the SMD frame- 
work for a wider class of pairwise terms (e.g. for the cases when pairwise terms define semi-metiics in the 
space of class labels) and to add more elaborate shape priors as global terms. 
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Appendix 

A Proof of theorem 1 

In this section we prove that the SMD decomposition is equivalent to the LP relaxation of Kleinberg and 
Tardos (9): 

p ^ p 

maxminL(y,A) = inin ^ ^ + - ^ ^Cij^pXij^p 

p 

s.t. ^ Xjp = 1 Vj G V 

p=i 

Xij,p > Xip - Xjp, V(i j) e £,p e {1, . . . ,P} 

X ij 5 j'P ^ 



Here we define (V, £) to be the directed graph corresponding to the undirected graph (V, £), i.e. £ = {{i 
j), {j i)} I G £}■ We can rewrite the Langrangian (6) as follows: 

p ^ p ( ^ \ 

L{Y, A) = X] ^ipViP + 2 X Cij,p\yip - yjp\ + Aj ^ yjp - 1 
jeVp=i («j)e^^P=i jey \p=i / 



The minimum of the Lagrangian over Y G C equals 
^>(A) = minL(y,A) 



(1) . 
= mm 



P ^ P ( ^ \ 

jGVp=i (i^j)g£P=i jev \p=i J 



s.t. yjp<l ViGV,pG{l,...,P} 

yij,p > Vip - Vjp V(i ^ j) G G {1, . . . , P} 

Vjp > 0, yij^p > 

max -YJ2l-'jP~12^i 

j£Vp=i jev 

+ Aj Vi G V,pG {1,...,P} 

fJ'ij,p < ^Cij^p y{i^j),P 
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where (1) uses a standard relaxation of the s-t min cut problem (which is known to have no duality gap), 
and (2) uses LP duality. The maximization problem maxA ^(A) is now a linear program; its dual is 



$* = max$(A) 



A 

P , P 

mm 



s.t. xjp<l VjGV,pe{l,...,P} 

a^jj.p > - Xjp V(i j) e £,p e {1, . . . ,P} 

p 

p=l 

jj) 5 *^ ij jj) 

which is the LP of Kleinberg and Tardos. 

B Proof of theorem 2 

This section shows that the LP of Kleinberg and Tardos is equivalent to the Shlesinger LP in case of Potts 
model. 

Given two vectors x,y e [0,1]^ with J2p=iXp = Ylp=iyp = 1 non-negative constants Cp, 
p = 1, . . . , P, define 

p p 

f{x,y) = min ^ ^ dp^Zp, (19) 

p=l q=l 
P 

S.t. ^ Zpg = Xp (20) 

9=1 
P 

^ Zpq = Vq (21) 

p=l 

Zpq > (22) 



where dpq = if p = q, and dpq = ^ ^ '' otherwise. It suffices to show that f{x, y) = ^ J2p=i Cp\xp- 



■yp\ 



Lemma 2 Denote = m.m{xp,yq}. There exists vector z £ [0,1]^^^ with Zpp = Zpp satisfying 
constraints (20), (21), (22). 

Proof. Consider the following algorithm for constructing z: 

50 Initialize: Zpp = z^, Zpq = for p / 

5 1 While there is a pair of labels (p, q) with Xp > X^^^^ Zpg/ and yp > J2p'=i ^p'q^ ^^e following: 

• compute 6 = min - Zpq/,yp - X]^=i Vg} 

• set Zpq • — ^PQ ~^ ^ 
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Clearly, the algorithm maintains the following invariants: 



p p 

— ^ ^ ^pq' Uq — ^ ^ ^p 

q' = l p' = l 



Furthermore, components of vector z never decrease, and after the update for (p, q) we have either Xp = 
'}2,^'=i ^pq' or Vq — J2p'=i ^p'q- Thus, the algorithm terminates in a finite number of steps. It can be seen 
that upon termination 

p p 

Xp = ^ ^ Zpq' Vq — ^ ^ Zp/q 

q' = l p' = l 

for all p and q. Indeed, suppose, for example, that upon termination Xp > J2q'=i ^pq' foi' some p, then there 
must hold Ug = J2p'=i ^p'q f^i' 9' othewise the algorithm wouldn't have terminated. Then 

p p p p p 

l = Y^Xp>^^Zpg> = ^^Zp,g = ^yg = l 

p=l p=l q' = l q=lp' = l q 



a contradiction. 

It remains to note that during the algorithm = -z^r foi" ^ ^ {Ij • • • ) ^} since either Xr = J2q=i ^rq 



or Vr — Xyp=l ^pr- 

Let z be the vector constructed in the lemma, then 



p p 
EE": 

p=l 9=1 



pqZpq 



I I ^ P P \ 

- I CpXp + Cql/g — 2 CrZrr 1 
\p=l q=l r=l / 

1 / ^ P P \ 

- ( Cr mm{xr,yr} + C'r max{xr, Ur} — 2 Cr min{xr., yr} j 

\r=l r=l r=l / 

- I ^ C'r max{xr, yr} - C'r min{.Xr, yr}\ = ^ y^Cr 

\r=l r=l / r=l 



Let z' be another vector satisfying (20), (21), (22). We must have z'^.^ < z^r = min{xr, yr}> therefore 
pp I ( ^ ^ ^ \ 

f^pg^pg = 2 CpXp + ^ Cg^q - 2 y^ Cr4r j 

p=l q=l \P=1 9=1 '"=1 / 

I P P P \ PP 

— 2 I 5Z ^P^P + ^ ^9^9 ~ ^ ^ Cr^rr j = ^ C^pg^pg 
\ p=l (7=1 r=l / p=l q=l 



This shows that /(x, y) = ^ X]r=i C'rl^/r — Xr\, as claimed 



C Proof of lemma 1 

Since minyg£ L{Y, A) is a piecewise linear function three cases are possible at its maximum (see fig. 2). 
In the first case there is a plateau on the top of the polytope. If subgradient consists of only one vector then 
the function is differentiable at the point A'', i.e. A'' is located somewhere inside the horizontal face that is 
defined by argminL(y, A'^) = Y^. But this means that L{Y^ , A) does not depend on A, i.e. € Q. For 
all y € a holds A) = E{Y) and minyg^ng L{y, A) = m.\uY&cng E{Y). 
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Consider the second case when there is no plateau but there exists a horizontal plane which passes 
through the point (A*^, L{Y^, ^^))- Let be the value of y G £ which corresponds to that plane. Since 
the plane is horizontal ^ Q as well. Then it holds 

L(y°,A°) = L(y\A°) = min L(y,A°)= min E(Y). 

Yecng Yecng 

Now assume that L(y°, A°) = mmyecng E{Y) = E{Y^). On the other hand minyg£ne -E'(y) = 
^(yi) = L(yi, AO). Hence L(yO, A^) = L(yi, A^). From that it follows that there is a plane which 
con^esponds to Y^ passing thi^ough the point (A^, L{Y^ , A^)). Since Y^ Q the plane is horizontal and 
we have either the first or the second case. 

Finally there can be the third case when there's no horizontal plane at maximin point. In this case a 
non-zero duality gap takes place and we can only get the lower bound for optimal value of energy E{Y). 



D Proof of theorem 9 



In this section we prove that min-marginal averaging process defined by (11) if monotonic and satisfies 
weak agreement condition at the fixed point. 

Lemma 3 Every submodular function 



Xi , Xj j 

(jev) {i,j)&£ 
where Xj G {0, 1} can be reparameterized in such a way 

E{x) = ^ji^j) + dij{xi,Xj) + Const, 
(jev) {i,j)e£ 

Oj{k) = min E{x\xj = k) — Eq, (23) 

Oi{l) + k) + 9j{k) = m.m.E{x\xi = I, xj = k) - Eq, (24) 

X 

where Eq = min^ E{x). 

Proof. We perform tree-based decomposition and make use of the fact that TRW converges to optimal 
value of energy (strong tree agreement) for submodular functions. 

Consider a decomposition of MRF into a set of trees T such that each edge (f, j) G f is covered by 
only one tree. It can be shown that each tree T can be reparameterized the following form [_ ;>, theorem 1]: 

e'^{k) = m:mE'^{x\xj = k)-ET, (25) 

X 

C(0 + ^lil k) + ej{k) = min E'^{x\xi = I, xj = k) - Et, (26) 

where E^ = min^S-^(x). Using the fact that for submodular functions we have strong tree agreement [14, 
theorem 3] we may write 

Eq = mixiE{x) = m\\iE'^{x) = Et- 
TeT TeT 

The same is ti"ue for conditional minimums since they are minimums of submodular functions 

min^ E{x\xj = k) as well: 

mSp.E{x\xj = k)=Y mmE'^{x\xj = A:) = ^ (^J(^) + ^t) = ^ Oj{k) + Eq. 
^ TeT TeT T-.x^eVr 
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Each {0^} defines energy E'^, hence their sum defines the initial energy E. 

ej{k)= dj{k) = mmE{x\xj = k) - Eq. 

T-.xjeVr ^ 

Similar reasoning is valid for the edges 

mill E{x\xi = l,Xj = k) = min£'"^(x|xj = l,Xj = k) = miiiE'^ {x\xi = l) + 

^ TeT ^ T-.x^gVt, Xj^Vr ^ 

+ uiinE'^ {x\xj = k) + mm E'^'^ {x\xi = l,Xj = k) = 

T-.xjeVr, Xi^Vr ^ ^ 

E {eJ{i) + ET)+ (eJ{k) + ET)+ef^{i)+ef^^{i,k)+ef{k)+ET,^ = 

= Yl ^^^^)+ E oJ{k) + ef.^{i,k) + E^ = ei{i) + e,{k) + e^,{Uk) + Eo, 

T-.x^GVt T-.XjGVt 

where Tij is the tree that covers edge ■ 
Now we prove theorem 9. Consider the change of a single \j with all other A's unchanged. Ac- 
cording to the previous lemma we can use the reparameterization defined by (23), (24). Let s = 



arg maxq 



0.(0) - 



Then 



If AXj > then 

miii$p(yp, A'^'^"') > min$p(yp, A"''^), Vp s 

and 

min$p(y„ A'^""') = min$p(y„ A"''^) + AA^. 
Last equation follows from the fact that ^^^(1) = and Zjs{A'^'''") = {0, 1}. Using the fact that 



p 

L{Y, A) = Y ^) - E 



p=l " jGV 

we obtain 

L(y, A"^'") > L(y, A"''^) + AAj - AAj = L(y, A"''^). 

Now consider the case when AAj < 0. Hence it follows that all 6jr{0) = and consequently 9js{l) = 
min-r ^jr(l). The decrease of Xj by AAj = — ^^^(1) still keeps {0} G Zjp for allp, i.e. 

min$p(yp,A"""') =min$p(yp,A°'"'), Vp. 

The lower bound increases because of the last term 

L(y, A"^'") = L(y, A"'"') - AXj > L(y, A°'"'). 

Now we prove that the fixed point of the iterative process always satisfies weak agreement condition. 
Let all AAj = for some value of A*^. Since maxg Ojq{0) — Ojq{l) = for all vertices j, we observe 
that 1 G Zjs and Vp : G Zjp, which imphes weak agreement condition. 
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